Bayes' Theorem and the Use of Prior Knowledge in
Regression Analysis

George C. Tiao and Arnold Zellner


I.  INTRODUCTION 

The  use  of  Bayes'  theorem  in  statistical  inference  has  recently 
been  reconsidered  in  the  works  of  Jeffreys  (1957,  1961),  Savage  (1959, 
1961, 1962), Raiffa and Schlaifer (1961), Box and Tiao (1962, 1963) and
others.  Emerging  from  these  works  are  what  we  consider  to  be  at  least 
two  distinct  advantages  of  the  Bayesian  approach.  First,  this  approach 
provides  an  excellent  framework  for  the  systematic  and  logical  assessment 
of  the  adequacy  of  the  assumptions  which  are  used  in  many  statistical 
models.  Examples  which  illustrate  this  use  of  the  approach  may  be  found 
in  the  works  of  Box  and  Tiao  in  which  the  effects  of  certain  departures 
from  normality  are  assessed  in  making  inferences  about  location  and  scale 
parameters. Second, given that a model is adequate, the Bayesian approach
is one in which prior knowledge about parameters of interest can be
combined  in  a  well-defined  mathematical  way  with  information  obtained 
from  an  experiment.  Such  prior  knowledge,  which  may  arise  from  general 
theoretical  considerations  and/or  the  results  of  previous  or  concurrent 
experiments,  is  usually  an  important  component  of  an  investigator's  quest 
for  understanding.  In  this  paper  we  illustrate  how  prior  knowledge  can 
be  utilized  in  conjunction  with  sample  information  in  making  inferences 
about  the  parameters  of  the  regression  model,  a  model  which  is  used 
extensively  in  many  areas  of  research. 

The plan of the paper is as follows. In Section 2, we review
several Bayesian analyses of the regression model which have appeared in
the literature and go on to develop two additional models which we believe
have desirable features not found in other models. Some technical results
needed to implement the models in practice are presented in Section 3.
Then in Section 4 we apply our methods in the analysis of investment
data relating to two large corporations. Finally, in Section 5 we
provide a summary.

II.  BAYESIAN  ANALYSIS  OF  THE  REGRESSION  MODEL 

2.1  Specification  of  the  Model 

We employ the Bayesian approach to make inferences about a
regression coefficient vector $\beta' = (\beta_1, \beta_2, \ldots, \beta_p)$. This vector
of coefficients appears in the usual regression model as follows:

(2.1)  $y = X\beta + e$

where $y$ is a $T \times 1$ vector of observations, $X$ is a $T \times p$ matrix of fixed
elements with rank $p$, and $e$ is a $T \times 1$ vector of random disturbances. We
assume that the elements of $e$ are normally and independently distributed,
each with mean zero and unknown variance $\sigma^2$. Under these assumptions our
joint likelihood function is:

(2.2)  $\ell(\beta, \sigma \mid y) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{T} \exp\left\{-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right\}$.

For simplicity in notation we shall use the symbol $Q(\beta, \eta, A)$ throughout
this paper to denote a quadratic form in variables $\beta$ centered at $\eta$ and
with matrix $A$, namely

$Q(\beta, \eta, A) = (\beta - \eta)' A (\beta - \eta)$.

In this notation, the likelihood function can be written:

(2.3)  $\ell(\beta, \sigma \mid y) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{T} \exp\left\{-\frac{1}{2\sigma^2}\left[\nu s^2 + Q(\beta, \hat\beta, Z)\right]\right\}$

where $Z = X'X$, $\hat\beta = Z^{-1}X'y$, $\nu = T - p$ and $s^2 = \frac{1}{\nu}(y - X\hat\beta)'(y - X\hat\beta)$.
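The step from (2.2) to (2.3) rests on the identity $(y - X\beta)'(y - X\beta) = \nu s^2 + Q(\beta, \hat\beta, Z)$. The following is a minimal numerical sketch of that identity in Python with NumPy; the simulated data, the seed and the helper name `Q` are illustrative assumptions and are not part of the paper.

```python
import numpy as np

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
T, p = 50, 3
X = rng.normal(size=(T, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.7, size=T)

# Quantities defined below (2.3).
Z = X.T @ X
beta_hat = np.linalg.solve(Z, X.T @ y)
nu = T - p
resid = y - X @ beta_hat
s2 = resid @ resid / nu


def Q(beta, eta, A):
    """Quadratic form (beta - eta)' A (beta - eta)."""
    d = beta - eta
    return d @ A @ d


# Compare both sides of the identity at an arbitrary trial value of beta.
beta = rng.normal(size=p)
lhs = (y - X @ beta) @ (y - X @ beta)
rhs = nu * s2 + Q(beta, beta_hat, Z)
print(lhs, rhs)
```

The two printed numbers agree up to rounding because the cross term vanishes: the least squares residuals are orthogonal to the columns of $X$.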

Using Bayes' theorem, the likelihood function in (2.3) is combined with a
prior distribution $p(\beta, \sigma)$ of the parameters $\beta$ and $\sigma$ to yield a joint posterior
distribution $p(\beta, \sigma \mid y)$ for these parameters, that is

(2.4)  $p(\beta, \sigma \mid y) = K\, p(\beta, \sigma)\, \ell(\beta, \sigma \mid y)$

where $K^{-1} = \int_R p(\beta, \sigma)\, \ell(\beta, \sigma \mid y)\, d\beta\, d\sigma$.

From the joint posterior distribution of $\beta$ and $\sigma$, we can then derive marginal
and conditional posterior distributions for $\sigma$ and for particular elements of $\beta$.
Clearly the form of our posterior distribution will depend on the kind of prior
information which we have available and the way in which we represent it.
In what follows, we consider several formulations which have appeared in the
literature and then go on to present and analyze two models which we have
developed.

2.2  Locally Uniform Prior Distributions

In problems involving estimation of location and scale parameters, it has
been argued in several previous works (Jeffreys (1961), Savage (1961), Box and
Tiao (1962)) that, in many practical situations, it is appropriate to use Bayes'
theorem with the assumption that the location parameters and the logarithm of
the scale parameters are independent and have locally uniform prior distributions.
By a locally uniform prior distribution we mean a distribution function which is
practically uniform over the region in which the likelihood function assumes
appreciable values, and at no other point is it of sufficiently great magnitude as
to become appreciable when multiplied by the likelihood. When such prior
distributions are employed, the posterior distribution of the location parameters and
the logarithm of the scale parameters is closely approximated by the likelihood
function. In the context of the present problem, since the $\beta_i$'s are location
parameters and $\sigma$ is a scale parameter, we have then:

(2.5a)  $p(\beta) \propto k_1$

(2.5b)  $p(\log \sigma) \propto k_2$  or  $p(\sigma) \propto \frac{1}{\sigma}$


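Substituting (2.3) and (2.5) in (2.4), the joint posterior
distribution of $\beta$ and $\sigma$ is:

(2.6)  $p(\beta, \sigma \mid y) = \text{const.}\ \sigma^{-(T+1)} \exp\left\{-\frac{1}{2\sigma^2}\left[\nu s^2 + Q(\beta, \hat\beta, Z)\right]\right\}$.

This posterior distribution can be written as

$p(\beta, \sigma \mid y) = p(\sigma \mid y)\, p(\beta \mid \sigma, y)$

where

(2.7)  $p(\sigma \mid y) = \text{const.}\ \sigma^{-(T-p+1)} \exp\left\{-\frac{\nu s^2}{2\sigma^2}\right\}$

and

(2.8)  $p(\beta \mid \sigma, y) = \text{const.}\ \sigma^{-p} \exp\left\{-\frac{1}{2\sigma^2} Q(\beta, \hat\beta, Z)\right\}$.

We see that (2.7) is in the form of an "inverted" gamma distribution and
(2.8) is a multivariate normal distribution with mean $\hat\beta$ and covariance
matrix $\sigma^2 Z^{-1}$.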
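The factorization into (2.7) and (2.8) suggests a simple way to simulate from the joint posterior: under (2.7) the quantity $\nu s^2/\sigma^2$ has a chi-square distribution with $\nu$ degrees of freedom, and given $\sigma$ the vector $\beta$ is normal. The sketch below assumes Python with NumPy; the function name `sample_posterior` and its arguments are hypothetical and chosen only for illustration.

```python
import numpy as np


def sample_posterior(X, y, n_draws, rng):
    """Draw (beta, sigma) pairs from the posterior (2.6) via (2.7) and (2.8)."""
    T, p = X.shape
    Z = X.T @ X
    beta_hat = np.linalg.solve(Z, X.T @ y)
    nu = T - p
    s2 = np.sum((y - X @ beta_hat) ** 2) / nu
    # (2.7): nu * s2 / sigma^2 is chi-square with nu degrees of freedom.
    sigma2 = nu * s2 / rng.chisquare(nu, size=n_draws)
    # (2.8): beta | sigma is normal with mean beta_hat, covariance sigma^2 Z^{-1}.
    L = np.linalg.cholesky(np.linalg.inv(Z))
    z = rng.standard_normal((n_draws, p))
    beta = beta_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)
    return beta, np.sqrt(sigma2), beta_hat


# Illustrative use on simulated data (an assumption, not data from the paper).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=40)
beta_draws, sigma_draws, beta_hat = sample_posterior(X, y, 20000, rng)
print(beta_draws.mean(axis=0), beta_hat)
```

The average of the simulated $\beta$ values should lie close to $\hat\beta$, the center of the posterior distribution.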


When $\sigma$ is unknown, the marginal posterior distribution of $\beta$ is
obtained by integrating the joint posterior density function over $\sigma$, that is,

(2.9)  $p(\beta \mid y) = \int_0^\infty p(\beta, \sigma \mid y)\, d\sigma = \text{const.}\left\{1 + \frac{Q(\beta, \hat\beta, Z)}{\nu s^2}\right\}^{-\frac{\nu+p}{2}}$.

By taking $t_i = (\beta_i - \hat\beta_i) / \{s (z^{ii})^{1/2}\}$ and $r^{ij} = z^{ij} / (z^{ii} z^{jj})^{1/2}$, where $z^{ij}$
denotes the $(i,j)$th element of $Z^{-1}$, we obtain

(2.10)  $p(t) = \text{const.}\left\{1 + \frac{\sum_i \sum_j r_{ij}\, t_i t_j}{\nu}\right\}^{-\frac{\nu+p}{2}}$

where $r_{ij}$ is the $(i,j)$th element of the inverse of the matrix $\{r^{ij}\}$. This
is a multivariate t distribution, a result derived by Savage (1961)
using the Bayesian approach. It can easily be shown that the marginal
posterior distribution of a subset of the elements of $\beta$ is also in the
same form as in (2.9) and can therefore be transformed into a multivariate
t distribution. In addition, the marginal distribution of the quantity $t_i$
is simply a univariate t distribution with $T - p$ degrees of freedom.
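The change of variables leading to (2.10) can be checked numerically: the standardized quadratic form $Q(\beta, \hat\beta, Z)/s^2$ should equal $\sum_i \sum_j r_{ij} t_i t_j$. The sketch below, in Python with NumPy, assumes that $z^{ij}$ are the elements of $Z^{-1}$ and that $r_{ij}$ are the elements of the inverse of the matrix of the $r^{ij}$; the variable names and the simulated data are illustrative only.

```python
import numpy as np

# Simulated regression data (illustrative assumption).
rng = np.random.default_rng(4)
T, p = 30, 3
X = rng.normal(size=(T, p))
y = X @ np.array([0.3, 1.2, -0.8]) + rng.normal(size=T)

Z = X.T @ X
Z_inv = np.linalg.inv(Z)
beta_hat = Z_inv @ X.T @ y
nu = T - p
s = np.sqrt(np.sum((y - X @ beta_hat) ** 2) / nu)

# r_upper holds r^{ij} = z^{ij} / sqrt(z^{ii} z^{jj}); r_lower is its inverse.
z_diag = np.diag(Z_inv)
r_upper = Z_inv / np.sqrt(np.outer(z_diag, z_diag))
r_lower = np.linalg.inv(r_upper)

# Standardize an arbitrary trial value of beta.
beta = beta_hat + rng.normal(scale=0.2, size=p)
t = (beta - beta_hat) / (s * np.sqrt(z_diag))

quad_beta = (beta - beta_hat) @ Z @ (beta - beta_hat) / s**2
quad_t = t @ r_lower @ t
print(quad_beta, quad_t)
```

Both printed values coincide, so the kernel in (2.9) and the kernel in (2.10) describe the same distribution in different coordinates.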

We note that these results can also be derived from Fisher's
fiducial theory. Further, from the sampling theory point of view, the
statistics $\hat\beta$ and $s$ are regarded as random variables. The distribution
in (2.10) is then precisely the joint distribution of the quantities
$t_i = (\beta_i - \hat\beta_i) / \{s (z^{ii})^{1/2}\}$, $i = 1, 2, \ldots, p$, as shown by Cornish (1954)
and by Dunnett and Sobel (1954). There is, of course, nothing new in the
above. We record these results as an introduction to the more general
models which we present below.

2.3  Normal-Gamma Representation of Prior Distributions

In situations where some prior information about the parameter $\beta$
is available, we can take as our joint prior distribution for $\beta$ and a
certain scale parameter $\sigma_1$*:

(2.11)  $p(\beta, \sigma_1) = p(\sigma_1)\, p(\beta \mid \sigma_1)$

where

(2.12)  $p(\sigma_1) = \text{const.}\ \sigma_1^{-(\nu_1+1)} \exp\left\{-\frac{\nu_1 s_1^2}{2\sigma_1^2}\right\}$

and

(2.13)  $p(\beta \mid \sigma_1) = \text{const.}\ \sigma_1^{-p} \exp\left\{-\frac{1}{2\sigma_1^2} Q(\beta, \tilde\beta, Z_1)\right\}$.

*The reasons for introducing $\sigma_1$ will be made clear in the
following discussion where various models are considered.

The quantities $\nu_1$, $s_1^2$ and the elements of $\tilde\beta$ and $Z_1$ are all known constants;
the matrix $Z_1$ is assumed to be non-negative definite. The prior distribution
in (2.11) is called a "normal-gamma" distribution by Raiffa and Schlaifer
(1961) and is seen to be in the same form as the posterior distribution
of $\beta$ and $\sigma$ in (2.6). It can be used, for instance, when experiments
are conducted sequentially and the posterior distribution of the parameter(s)
of previous experiments is taken as the prior distribution for the
current experiment. Suppose the likelihood function of our previous
experiments takes the form:

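(2.14)  $\ell(\beta, \sigma_1 \mid y_1) = \left(\frac{1}{\sigma_1\sqrt{2\pi}}\right)^{T_1} \exp\left\{-\frac{1}{2\sigma_1^2}(y_1 - X_1\beta)'(y_1 - X_1\beta)\right\}$.

Then, upon making similar assumptions about the prior distributions for
$\beta$ and $\sigma_1$ as discussed in Section 2.2, and by setting

$Z_1 = X_1'X_1$, $\tilde\beta = Z_1^{-1}X_1'y_1$, $\nu_1 = T_1 - p$ and $s_1^2 = \frac{1}{\nu_1}(y_1 - X_1\tilde\beta)'(y_1 - X_1\tilde\beta)$,

we find that the posterior distribution of $\beta$ and $\sigma_1$ is precisely that
given in (2.11).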

In taking $p(\beta, \sigma_1)$ in (2.11) as the prior distribution to be
combined with the likelihood function in (2.2), we immediately see that
the exact form of the posterior distribution of $\beta$ will depend upon our
knowledge about the relationship between the scale parameter $\sigma$ which
appears in (2.2) and the new scale parameter $\sigma_1$ introduced in (2.11).
In what follows we distinguish three different situations: (i) $\sigma_1$ is
functionally related to $\sigma$; (ii) $\sigma_1$ is known to take some fixed value
$\sigma_{10}$ and is independent of $\sigma$; and (iii) $\sigma_1$ is unknown and independent
of $\sigma$.
2.4  Situation Where $\sigma_1$ and $\sigma$ are Functionally Related

Raiffa and Schlaifer (1961) have considered the case in which $\sigma_1$
is proportional to $\sigma$ with a known factor of proportionality, that is,
$\sigma_1 = k\sigma$ with the value of $k$ fixed. Since $k$ is known, there is no loss
in generality in assuming that $k = 1$ so that $\sigma_1 = \sigma$. This assumption is
appropriate, for example, in situations in which experiments are conducted
sequentially under well controlled conditions which ensure constancy of
the variances of random disturbances in all experiments. The prior
distribution of $\beta$ and $\sigma_1$ in (2.11), which can be regarded as the posterior
distribution of these parameters arising from previous experiments, then
provides a priori information for both the parameters $\beta$ and the scale
parameter $\sigma$. When this prior distribution is employed in conjunction with
the likelihood function in (2.2), the joint posterior distribution of $\beta$
and $\sigma$ is given by:

(2.15)  $p(\beta, \sigma \mid y) = p(\sigma \mid y)\, p(\beta \mid \sigma, y)$

where

$p(\sigma \mid y) = \text{const.}\ \sigma^{-(\nu_1+T+1)} \exp\left\{-\frac{\nu s^2 + \nu_1 s_1^2}{2\sigma^2}\right\}$,

$p(\beta \mid \sigma, y) = \text{const.}\ \sigma^{-p} \exp\left\{-\frac{1}{2\sigma^2} Q(\beta, \bar\beta, Z_2)\right\}$,

$Z_2 = Z + Z_1$ and $\bar\beta = Z_2^{-1}(Z\hat\beta + Z_1\tilde\beta)$.

On integrating out $\sigma$ from (2.15), we obtain the posterior distribution of $\beta$,

(2.16)  $p(\beta \mid y) = \text{const.}\left\{1 + \frac{Q(\beta, \bar\beta, Z_2)}{\nu' s'^2}\right\}^{-\frac{\nu'+p}{2}}$

with $\nu' = \nu_1 + T$ and $s'^2 = \frac{1}{\nu'}(\nu_1 s_1^2 + \nu s^2)$.

This distribution is in the same form as that given in (2.9) and can be
transformed into a multivariate t distribution as indicated above.
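When the prior in (2.11) is itself the posterior from an earlier experiment, the center of the combined posterior, a precision-weighted average of the two least squares estimates with weights $Z = X'X$ and $Z_1 = X_1'X_1$, coincides with the least squares estimate from the pooled data. The sketch below illustrates this in Python with NumPy; the simulated data and variable names are assumptions made only for the illustration.

```python
import numpy as np

# Two simulated experiments sharing the same coefficients (illustrative).
rng = np.random.default_rng(2)
p = 2
beta_true = np.array([0.5, -1.0])
X1 = rng.normal(size=(30, p))
y1 = X1 @ beta_true + rng.normal(size=30)
X = rng.normal(size=(45, p))
y = X @ beta_true + rng.normal(size=45)

# Summaries of the earlier experiment (the prior) and the current sample.
Z1 = X1.T @ X1
beta_tilde = np.linalg.solve(Z1, X1.T @ y1)
Z = X.T @ X
beta_hat = np.linalg.solve(Z, X.T @ y)

# Posterior center: precision-weighted average of the two estimates.
Z2 = Z + Z1
beta_bar = np.linalg.solve(Z2, Z @ beta_hat + Z1 @ beta_tilde)

# Least squares on the pooled observations gives the same vector.
X_all = np.vstack([X1, X])
y_all = np.concatenate([y1, y])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
print(beta_bar, beta_pooled)
```

The agreement follows from $Z\hat\beta = X'y$ and $Z_1\tilde\beta = X_1'y_1$, so that the weighted sum is the pooled cross-product vector.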
2.5  Situation in Which $\sigma_1$ is Known

In many circumstances, as Theil (1962) has pointed out, the
assumption that $\sigma_1$ and $\sigma$ are functionally related is inappropriate. For
instance, in econometric analysis it is frequently the case that theoretical
considerations may lead the investigator to impose certain, perhaps
imprecise, a priori restrictions on the value of $\beta$. The conditional
prior distribution of $\beta$ in (2.13) for some assigned value of $\sigma_1$, say
$\sigma_1 = \sigma_{10}$, may be utilized as a mathematical representation of these
a priori restrictions with the assigned $\sigma_{10}$ measuring, in some sense,
the investigator's uncertainty about them.* Since $\sigma_1$ is now regarded as
a measure of subjective feelings, whereas $\sigma$ in the likelihood function is
a measure of experimental error, there is little reason for supposing
that they are functionally related. Thus, assigning the value $\sigma_{10}$ to $\sigma_1$
provides us with no information about $\sigma$. We may then follow the analysis
in Section 2.2 and take $\log \sigma$ to be locally uniformly distributed
a priori. With these assumptions, the posterior distribution of $\beta$ is:


(2.17)  $p(\beta \mid y) = k \exp\left\{-\frac{1}{2\sigma_{10}^2} Q(\beta, \tilde\beta, Z_1)\right\} \left\{1 + \frac{Q(\beta, \hat\beta, Z)}{\nu s^2}\right\}^{-\frac{\nu+p}{2}}$

where

$k^{-1} = \int_R \exp\left\{-\frac{1}{2\sigma_{10}^2} Q(\beta, \tilde\beta, Z_1)\right\} \left\{1 + \frac{Q(\beta, \hat\beta, Z)}{\nu s^2}\right\}^{-\frac{\nu+p}{2}} d\beta$.

This posterior distribution is seen to be in the form of the product of a
multivariate normal distribution and a multivariate t distribution.
Hereafter, we shall denote a distribution of this type as a multivariate
"normal-t" distribution. We note that, when $\nu$ tends to infinity, the
expression $\left\{1 + \frac{Q(\beta, \hat\beta, Z)}{\nu s^2}\right\}^{-\frac{\nu+p}{2}}$ tends to:

(2.18)  $\lim_{\nu \to \infty} \left\{1 + \frac{Q(\beta, \hat\beta, Z)}{\nu s^2}\right\}^{-\frac{\nu+p}{2}} = \exp\left\{-\frac{1}{2s^2} Q(\beta, \hat\beta, Z)\right\}$.

*In addition to assigning a value to $\sigma_1$, it is of course necessary
to assign values to $\tilde\beta$ and the matrix $Z_1$ in (2.13).

Thus, in the limit, we have for the posterior distribution of $\beta$ in (2.17),

(2.19)  $\lim_{\nu \to \infty} p(\beta \mid y) = \frac{|C|^{1/2}}{(2\pi)^{p/2}} \exp\left\{-\frac{1}{2} Q(\beta, \bar\beta, C)\right\}$

with $C = \frac{1}{\sigma_{10}^2} Z_1 + \frac{1}{s^2} Z$ and $\bar\beta = C^{-1}\left(\frac{1}{\sigma_{10}^2} Z_1 \tilde\beta + \frac{1}{s^2} Z \hat\beta\right)$

which is a multivariate normal distribution with mean $\bar\beta$ and covariance
matrix $C^{-1}$. For finite values of $\nu$, the normalizing constant $k$ in (2.17)
is a p-dimensional integral which cannot be expressed in terms of simple
functions. Nevertheless, it can be approximated using methods similar to
those described in Section 3.

Before leaving this section, we shall make a few remarks about the
work of Theil (1962) and Theil and Goldberger (1960) in connection with
the use of prior knowledge in regression analysis. Theil and Goldberger
are primarily interested in utilizing prior information about $\beta$ in
conjunction with a sample to provide a point estimate of $\beta$ which
incorporates both prior and sample information. In their treatment, the
regression model is specified as that given in (2.1) except for the normality
assumption, that is


$y = X\beta + e$

with $E(e) = 0$ and $E(ee') = I\sigma^2$.

The prior information about $\beta$ can be put in the form:

(2.20)  $y_1 = X_1\beta + e_1$

where the elements of $e_1$ are independently distributed, each with zero
mean and known variance $\sigma_1^2$. Further, $\sigma_1^2$ is assumed to be functionally
independent of $\sigma^2$. From the sampling theory point of view, they show
that, when $\sigma^2$ is known, the statistic

(2.21)  $\beta^{*} = \left(\frac{1}{\sigma^2} X'X + \frac{1}{\sigma_1^2} X_1'X_1\right)^{-1} \left(\frac{1}{\sigma^2} X'y + \frac{1}{\sigma_1^2} X_1'y_1\right)$

is the minimum variance linear unbiased estimator. In case $\sigma^2$ is not known,
Theil substitutes $s^2$, the sample variance, for $\sigma^2$ in (2.21) and proceeds
to show that the resulting statistic $\bar\beta$, given by

(2.22)  $\bar\beta = \left(\frac{1}{s^2} X'X + \frac{1}{\sigma_1^2} X_1'X_1\right)^{-1} \left(\frac{1}{s^2} X'y + \frac{1}{\sigma_1^2} X_1'y_1\right)$,

V  •'  *-  .1 

differs  from  p  by  a  quantity  which  is  of  order  T  in  probability. 

It may be of interest to observe the parallelism of the above
results and those from the Bayesian formulation we have considered. When
a normality assumption is added, the likelihood function corresponding to
(2.20) is proportional to the expression given in (2.14). For the case
$\sigma^2$ known, it can readily be shown that the posterior distribution of $\beta$
is multivariate normal with mean given by the expression in (2.21). In
the case where $\sigma^2$ is not known, the expression in (2.22) is precisely
the limiting mean for the multivariate "normal-t" distribution as $\nu$ tends
to infinity [see equation (2.19)]. This result is, of course, to be
expected since, except for the normality assumption about the disturbances,
all other underlying assumptions are very much the same in both approaches.
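The statistic in (2.22) can be computed either directly from its defining expression or as ordinary least squares applied to the sample rows scaled by $1/s$ stacked on the prior rows scaled by $1/\sigma_1$. The sketch below shows both routes in Python with NumPy; the function name `mixed_estimate`, the simulated sample and the prior values are hypothetical and serve only as an illustration.

```python
import numpy as np


def mixed_estimate(X, y, X1, y1, sigma1_sq):
    """Evaluate the statistic in (2.22), with s^2 substituted for sigma^2."""
    T, p = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.sum((y - X @ beta_hat) ** 2) / (T - p)
    A = X.T @ X / s2 + X1.T @ X1 / sigma1_sq
    b = X.T @ y / s2 + X1.T @ y1 / sigma1_sq
    return np.linalg.solve(A, b), s2


# Simulated sample plus prior "observations" on each coefficient (assumed).
rng = np.random.default_rng(7)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=40)
X1 = np.eye(2)
y1 = np.array([0.8, 2.3])
sigma1_sq = 0.25

estimate, s2 = mixed_estimate(X, y, X1, y1, sigma1_sq)

# Same result from least squares on the rescaled, stacked system.
X_stacked = np.vstack([X / np.sqrt(s2), X1 / np.sqrt(sigma1_sq)])
y_stacked = np.concatenate([y / np.sqrt(s2), y1 / np.sqrt(sigma1_sq)])
stacked = np.linalg.lstsq(X_stacked, y_stacked, rcond=None)[0]
print(estimate, stacked)
```

With $Z_1 = X_1'X_1$ and $\tilde\beta = Z_1^{-1}X_1'y_1$, the same vector is the mean of the limiting normal distribution in (2.19) when $\sigma_{10} = \sigma_1$.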

2.6  Situation in Which $\sigma_1$ is Regarded as a Variable Parameter
and Independent of $\sigma$

We have considered two models above, one in which it is assumed
that $\sigma_1 = k\sigma$ with $k$ known, and the other in which $\sigma_1$ is independent of $\sigma$
but takes on a fixed value $\sigma_{10}$. As a generalization of the second model,
we now consider $\sigma$ and $\sigma_1$ to be independent variable parameters.
This formulation will be applicable, for example, in the following
situation. Suppose that the results of two sets of experiments are
utilized to make inferences about $\beta$ and that the associated likelihood
functions are given by $\ell(\beta, \sigma_1 \mid y_1)$ in (2.14) and $\ell(\beta, \sigma \mid y)$ in (2.2),
respectively. Suppose further that these two sets of experiments are
carried out under quite different conditions so that there is no basis for
assuming any relationship between $\sigma_1$ and $\sigma$. Following the
discussion in Section 2.3, it seems appropriate to take the normal-gamma
distribution $p(\beta, \sigma_1)$ in (2.11) as the posterior distribution associated
with the first set of experiments. This
distribution can then be regarded as representing prior information about
$\beta$ and $\sigma_1$ for the analysis of the second set. Since $\sigma_1$ and $\sigma$ are independent,
information about $\sigma_1$ represented by the marginal distribution $p(\sigma_1)$ in
(2.12) contributes nothing to the investigator's knowledge about $\sigma$. Thus,
all that is of interest in $p(\beta, \sigma_1)$ is the information concerning $\beta$.
This is, of course, represented by the marginal distribution $p(\beta)$, namely

(2.23)  $p(\beta) = \int_0^\infty p(\beta, \sigma_1)\, d\sigma_1 = \text{const.}\left\{1 + \frac{Q(\beta, \tilde\beta, Z_1)}{\nu_1 s_1^2}\right\}^{-\frac{\nu_1+p}{2}}$.


Using (2.23) as the prior distribution of $\beta$ and upon making the
same assumption about the prior distribution of $\sigma$ as in Section 2.2, the
posterior distribution of $\beta$ is readily found to be:

(2.24)  $p(\beta \mid y) = k_1 \left\{1 + \frac{Q(\beta, \tilde\beta, Z_1)}{\nu_1 s_1^2}\right\}^{-\frac{\nu_1+p}{2}} \left\{1 + \frac{Q(\beta, \hat\beta, Z)}{\nu s^2}\right\}^{-\frac{\nu+p}{2}}$

where $k_1$ is the normalizing constant. This posterior distribution is the
product of two multivariate t forms and is referred to as a multivariate "double-t"
distribution. As in the case of a multivariate "normal-t" distribution,
the normalizing constant $k_1$ in (2.24) is a p-dimensional integral. This
may lead to certain practical difficulties in the numerical evaluation of
the posterior distribution, particularly when $p$ is large. Similar
difficulties will also be encountered if one is interested in making
inferences about a subset of the elements of $\beta$, since in this case it
does not appear possible to express the corresponding marginal posterior
distribution of the subset of interest in terms of simple functions. In
the following section, we develop a method by which both the posterior
distribution in (2.24) and the marginal distributions of elements of $\beta$
can be approximated.

We note that when the vector $\beta$ has only one element ($p = 1$) and
the elements of the corresponding ($T \times 1$) matrix $X$ in (2.2) and ($T_1 \times 1$)
matrix $X_1$ in (2.14) have the same value, unity, the posterior distribution
in (2.24) takes the following form:
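(2.25)  $p(\beta \mid y) = k_1 \left\{1 + \frac{(\nu_1+1)(\beta - \bar{y}_1)^2}{\nu_1 s_1^2}\right\}^{-\frac{\nu_1+1}{2}} \left\{1 + \frac{(\nu+1)(\beta - \bar{y})^2}{\nu s^2}\right\}^{-\frac{\nu+1}{2}}$

where

$k_1^{-1} = \int_{-\infty}^{\infty} \left\{1 + \frac{(\nu_1+1)(\beta - \bar{y}_1)^2}{\nu_1 s_1^2}\right\}^{-\frac{\nu_1+1}{2}} \left\{1 + \frac{(\nu+1)(\beta - \bar{y})^2}{\nu s^2}\right\}^{-\frac{\nu+1}{2}} d\beta$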
and the quantities $\bar{y}_1$, $\bar{y}$, $s_1^2$ and $s^2$ are, respectively, the sample means and
sample variances for the two sets of experiments. This result corresponds
to the problem of making inferences about a population mean when samples are
drawn from two normal populations with common mean and unequal variances.
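In the univariate case the normalizing constant is a one-dimensional integral and can be obtained by simple quadrature. The sketch below evaluates the product of the two univariate t kernels, one centered at each sample mean, on a grid and normalizes it with the trapezoidal rule; it assumes Python with NumPy, and the function name `double_t_density`, the grid and the numerical inputs are hypothetical.

```python
import numpy as np


def double_t_density(grid, ybar1, s1_sq, nu1, ybar, s_sq, nu):
    """Product of two univariate t kernels, normalized on a grid."""
    k1 = 1.0 + (nu1 + 1) * (grid - ybar1) ** 2 / (nu1 * s1_sq)
    k2 = 1.0 + (nu + 1) * (grid - ybar) ** 2 / (nu * s_sq)
    log_kernel = -0.5 * (nu1 + 1) * np.log(k1) - 0.5 * (nu + 1) * np.log(k2)
    kernel = np.exp(log_kernel - log_kernel.max())  # rescale for stability
    area = np.sum(0.5 * (kernel[1:] + kernel[:-1]) * np.diff(grid))
    return kernel / area


# Two samples with equal precision and means -1 and +1 (illustrative values).
grid = np.linspace(-10.0, 10.0, 4001)
density = double_t_density(grid, -1.0, 1.0, 10, 1.0, 1.0, 10)

# Posterior mean by the trapezoidal rule.
integrand = grid * density
post_mean = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid))
print(post_mean)
```

Because the two samples in this illustration carry equal weight, the posterior is symmetric about the midpoint of the two sample means.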

It is of interest to note that the distribution given in (2.25) is exactly
the same as that obtained by Fisher (1961a, 1961b) from the fiducial
theory point of view. He proceeded to expand this distribution in an
asymptotic series in powers of $\nu_1^{-1}$ and $\nu^{-1}$, from which probability integrals
of $\beta$ can be approximated. We may remark here that our development in
Section 3 closely parallels Fisher's procedure.

It is easy to see that the analysis in this section can be
immediately generalized to cover situations in which several sets of
experiments are conducted sequentially (or concurrently) but under quite
different conditions. Suppose that the likelihood function for the $i$th
set of experiments can be represented by:

(2.26)  $\ell(\beta, \sigma_i \mid y_i) = \left(\frac{1}{\sigma_i\sqrt{2\pi}}\right)^{T_i} \exp\left\{-\frac{1}{2\sigma_i^2}(y_i - X_i\beta)'(y_i - X_i\beta)\right\}$

where $i = 1, 2, \ldots, K$ say. Then, by taking the $\sigma_i$'s as independent scale
parameters we obtain the following posterior distribution of $\beta$:


(2.27)  $p(\beta \mid y) = k_K \prod_{i=1}^{K} \left\{1 + \frac{Q(\beta, \hat\beta_i, Z_i)}{\nu_i s_i^2}\right\}^{-\frac{\nu_i+p}{2}}$

with

$k_K^{-1} = \int_R \prod_{i=1}^{K} \left\{1 + \frac{Q(\beta, \hat\beta_i, Z_i)}{\nu_i s_i^2}\right\}^{-\frac{\nu_i+p}{2}} d\beta$,

$\nu_i = T_i - p$, $Z_i = X_i'X_i$, $\hat\beta_i = Z_i^{-1}X_i'y_i$ and $s_i^2 = \frac{1}{\nu_i}(y_i - X_i\hat\beta_i)'(y_i - X_i\hat\beta_i)$.

This distribution is seen to be the product of K quantities each of which
can be expressed as a multivariate t distribution. It may, therefore, be
denoted as a multivariate "multiple-t" distribution and can be approximated
numerically using methods similar to those described in the next section.
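For numerical work it is convenient to evaluate the logarithm of the unnormalized "multiple-t" form, which is a sum of K terms, one per set of experiments. The sketch below assumes Python with NumPy; the function name `multiple_t_log_kernel` and the representation of the experiments as a list of `(X_i, y_i)` pairs are illustrative choices, not notation from the paper.

```python
import numpy as np


def multiple_t_log_kernel(beta, datasets):
    """Log of the unnormalized product of K multivariate t kernels."""
    beta = np.asarray(beta, dtype=float)
    total = 0.0
    for X_i, y_i in datasets:
        T_i, p = X_i.shape
        Z_i = X_i.T @ X_i
        b_i = np.linalg.solve(Z_i, X_i.T @ y_i)       # least squares estimate
        nu_i = T_i - p
        s2_i = np.sum((y_i - X_i @ b_i) ** 2) / nu_i  # residual variance
        d = beta - b_i
        total += -0.5 * (nu_i + p) * np.log1p(d @ Z_i @ d / (nu_i * s2_i))
    return total


# Two simulated experiments with different sample sizes (illustrative).
rng = np.random.default_rng(5)
datasets = []
for T_i in (20, 35):
    X_i = rng.normal(size=(T_i, 2))
    y_i = X_i @ np.array([1.0, -1.0]) + rng.normal(size=T_i)
    datasets.append((X_i, y_i))

print(multiple_t_log_kernel([0.9, -1.1], datasets))
```

Since $\nu_i s_i^2 + Q(\beta, \hat\beta_i, Z_i)$ is the residual sum of squares of the $i$th experiment at $\beta$, each term equals $-\frac{T_i}{2}$ times the logarithm of that sum of squares, up to an additive constant.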


III.  ASYMPTOTIC  EXPRESSION  FOR  THE  MULTIVARIATE  "DOUBLE-t"  POSTERIOR 
DISTRIBUTION 

3.1  The  Joint  Posterior  Distribution 

In the preceding section, we have shown that, when $\sigma_1$ and $\sigma$ are
regarded as independent variable parameters, the corresponding posterior
distribution of $\beta$ is in the form of the product of two multivariate t
distributions. [See (2.24).] The normalizing constant is a p-dimensional
integral which is in general difficult to evaluate even on a fast computer,
especially when $p$ is large. Nevertheless, we now show that, by expanding
the posterior distribution into an asymptotic series in powers of $\nu^{-1}$ and
$\nu_1^{-1}$, we can reduce the problem of integration to a problem of evaluating
the mixed moments of two quadratic forms. The same procedure is then
applied in the next section to obtain an asymptotic expression for the
marginal posterior distributions of elements of $\beta$.


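Since $s_1^2$ and $s^2$ in (2.24) are known quantities, they can be
suppressed by setting

$M = \frac{1}{s_1^2} Z_1$ and $B = \frac{1}{s^2} Z$.

We can then write (2.24) as:

(3.1)  $p(\beta \mid y) = k_1 \left\{1 + \frac{Q(\beta, \tilde\beta, M)}{\nu_1}\right\}^{-\frac{\nu_1+p}{2}} \left\{1 + \frac{Q(\beta, \hat\beta, B)}{\nu}\right\}^{-\frac{\nu+p}{2}}$

with

$k_1^{-1} = \int_R \left\{1 + \frac{Q(\beta, \tilde\beta, M)}{\nu_1}\right\}^{-\frac{\nu_1+p}{2}} \left\{1 + \frac{Q(\beta, \hat\beta, B)}{\nu}\right\}^{-\frac{\nu+p}{2}} d\beta$.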


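The expression $\left\{1 + \frac{Q(\beta, \hat\beta, B)}{\nu}\right\}^{-\frac{\nu+p}{2}}$ can be written:

$\left\{1 + \frac{Q(\beta, \hat\beta, B)}{\nu}\right\}^{-\frac{\nu+p}{2}} = \exp\left\{-\frac{1}{2} Q(\beta, \hat\beta, B)\right\} \cdot \exp\left\{\frac{1}{2} Q(\beta, \hat\beta, B) - \frac{\nu+p}{2} \log\left[1 + \frac{Q(\beta, \hat\beta, B)}{\nu}\right]\right\}$.

Expanding the second factor on the right in powers of $\nu^{-1}$, we obtain:

(3.2)  $\left\{1 + \frac{Q(\beta, \hat\beta, B)}{\nu}\right\}^{-\frac{\nu+p}{2}} = \exp\left\{-\frac{1}{2} Q(\beta, \hat\beta, B)\right\} \sum_{i=0}^{\infty} p_i\, \nu^{-i}$

where

$p_0 = 1$

$p_1 = \frac{1}{4}\left[Q^2(\beta, \hat\beta, B) - 2p\, Q(\beta, \hat\beta, B)\right]$

$p_2 = \frac{1}{96}\left[3Q^4(\beta, \hat\beta, B) - 4(3p+4)\, Q^3(\beta, \hat\beta, B) + 12p(p+2)\, Q^2(\beta, \hat\beta, B)\right]$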
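The quality of the expansion of $\{1 + Q/\nu\}^{-(\nu+p)/2}$ in powers of $\nu^{-1}$ can be examined numerically by comparing the exact factor with the series truncated after the $\nu^{-2}$ term. The sketch below assumes Python with NumPy; the function names and the chosen values of $Q$, $p$ and $\nu$ are illustrative.

```python
import numpy as np


def p_coeffs(Q, p):
    """Coefficients p_1 and p_2 of the expansion in powers of 1/nu."""
    p1 = (Q**2 - 2 * p * Q) / 4.0
    p2 = (3 * Q**4 - 4 * (3 * p + 4) * Q**3 + 12 * p * (p + 2) * Q**2) / 96.0
    return p1, p2


def t_factor_exact(Q, nu, p):
    """Exact value of {1 + Q/nu}^{-(nu+p)/2}."""
    return (1.0 + Q / nu) ** (-(nu + p) / 2.0)


def t_factor_series(Q, nu, p):
    """Normal factor exp(-Q/2) times the series truncated after nu^{-2}."""
    p1, p2 = p_coeffs(Q, p)
    return np.exp(-Q / 2.0) * (1.0 + p1 / nu + p2 / nu**2)


Q_val, p_dim, nu = 1.5, 3, 100.0
exact = t_factor_exact(Q_val, nu, p_dim)
series = t_factor_series(Q_val, nu, p_dim)
normal_only = np.exp(-Q_val / 2.0)
print(exact, series, normal_only)
```

At these values the truncated series is far closer to the exact factor than the leading normal term alone, which is the sense in which the higher terms act as corrections.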


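Similarly, we have that

(3.3)  $\left\{1 + \frac{Q(\beta, \tilde\beta, M)}{\nu_1}\right\}^{-\frac{\nu_1+p}{2}} = \exp\left\{-\frac{1}{2} Q(\beta, \tilde\beta, M)\right\} \sum_{j=0}^{\infty} q_j\, \nu_1^{-j}$

where

$q_0 = 1$

$q_1 = \frac{1}{4}\left[Q^2(\beta, \tilde\beta, M) - 2p\, Q(\beta, \tilde\beta, M)\right]$

$q_2 = \frac{1}{96}\left[3Q^4(\beta, \tilde\beta, M) - 4(3p+4)\, Q^3(\beta, \tilde\beta, M) + 12p(p+2)\, Q^2(\beta, \tilde\beta, M)\right]$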


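Substituting (3.2) and (3.3) into (3.1) and after a little reduction, we
can express the posterior distribution as:

(3.4)  $p(\beta \mid y) = W^{-1} \frac{|D|^{1/2}}{(2\pi)^{p/2}} \exp\left\{-\frac{1}{2} Q(\beta, \bar\beta, D)\right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} p_i\, q_j\, \nu^{-i} \nu_1^{-j}$

where

$D = B + M$, $\bar\beta = D^{-1}(B\hat\beta + M\tilde\beta)$

and

(3.5)  $W = \frac{|D|^{1/2}}{(2\pi)^{p/2}} \int_R \exp\left\{-\frac{1}{2} Q(\beta, \bar\beta, D)\right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} p_i\, q_j\, \nu^{-i} \nu_1^{-j}\, d\beta$.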


The integral $W$ in (3.5) can be integrated term by term. From
(3.2) and (3.3), we see that each term is, in fact, a bivariate polynomial
in the mixed moments of the quadratic forms $Q(\beta, \hat\beta, B)$ and $Q(\beta, \tilde\beta, M)$
where the variables $\beta$ have a multivariate normal distribution with mean $\bar\beta$
and covariance matrix $D^{-1}$. For this problem, it appears much simpler to
obtain the mixed moments indirectly by first finding the mixed cumulants.
It is straightforward to verify that the joint cumulant generating
function of $Q(\beta, \hat\beta, B)$ and $Q(\beta, \tilde\beta, M)$ is
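(3.6)  $\kappa(t_1, t_2) = \log \int_R \frac{|D|^{1/2}}{(2\pi)^{p/2}} \exp\left\{t_1 Q(\beta, \hat\beta, B) + t_2 Q(\beta, \tilde\beta, M) - \frac{1}{2} Q(\beta, \bar\beta, D)\right\} d\beta$

$= -\frac{1}{2} \log\left|I - 2D^{-1}(t_1 B + t_2 M)\right| + t_1\, \eta_1' B \eta_1 + t_2\, \eta_2' M \eta_2 + 2(t_1 B \eta_1 + t_2 M \eta_2)' (D - 2t_1 B - 2t_2 M)^{-1} (t_1 B \eta_1 + t_2 M \eta_2)$

where $\eta_1 = \bar\beta - \hat\beta$ and $\eta_2 = \bar\beta - \tilde\beta$.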

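Upon differentiating (3.6) and after some algebraic reduction, we find
(see Appendix):

(3.7)  $\kappa_{10} = \text{tr}\, D^{-1} B + \eta_1' B \eta_1$

$\kappa_{01} = \text{tr}\, D^{-1} M + \eta_2' M \eta_2$

$\kappa_{rs} = 2^{r+s-1} (r+s-2)! \left\{(r+s-1)\, \text{tr}\, D^{-1} G_{rs} + (r\eta_1 + s\eta_2)' G_{rs} (r\eta_1 + s\eta_2) - r\, \eta_1' G_{rs} \eta_1 - s\, \eta_2' G_{rs} \eta_2\right\}$,  $r + s \geq 2$

where $G_{rs} = D (D^{-1} B)^r (D^{-1} M)^s$.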
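The mixed cumulants of the two quadratic forms involve only traces and quadratic forms in the matrices $B$, $M$ and $D = B + M$, so they are straightforward to compute. The sketch below, in Python with NumPy, evaluates the general expression and compares the second-order cases with the standard variance and covariance formulas for quadratic forms in normal variables; the function name `kappa` and the test matrices are illustrative assumptions.

```python
import numpy as np
from math import factorial


def kappa(r, s, B, M, eta1, eta2):
    """Mixed cumulant of order (r, s) of the two quadratic forms, r + s >= 1."""
    if r + s < 1:
        raise ValueError("r + s must be at least 1")
    D = B + M
    D_inv = np.linalg.inv(D)
    if r + s == 1:
        A, eta = (B, eta1) if r == 1 else (M, eta2)
        return np.trace(D_inv @ A) + eta @ A @ eta
    G = (D @ np.linalg.matrix_power(D_inv @ B, r)
         @ np.linalg.matrix_power(D_inv @ M, s))
    v = r * eta1 + s * eta2
    bracket = ((r + s - 1) * np.trace(D_inv @ G) + v @ G @ v
               - r * (eta1 @ G @ eta1) - s * (eta2 @ G @ eta2))
    return 2 ** (r + s - 1) * factorial(r + s - 2) * bracket


# Arbitrary positive definite B and M and offset vectors (illustrative).
rng = np.random.default_rng(3)
A1 = rng.normal(size=(3, 3))
A2 = rng.normal(size=(3, 3))
B = A1 @ A1.T + np.eye(3)
M = A2 @ A2.T + np.eye(3)
eta1 = rng.normal(size=3)
eta2 = rng.normal(size=3)
D_inv = np.linalg.inv(B + M)

# Second-order checks against the usual variance and covariance formulas.
var_q1 = 2 * np.trace(D_inv @ B @ D_inv @ B) + 4 * (eta1 @ B @ D_inv @ B @ eta1)
cov_q12 = 2 * np.trace(D_inv @ B @ D_inv @ M) + 4 * (eta1 @ B @ D_inv @ M @ eta2)
print(kappa(2, 0, B, M, eta1, eta2), var_q1)
print(kappa(1, 1, B, M, eta1, eta2), cov_q12)
```

Only one matrix inversion is needed, which is consistent with the remark that the coefficients require merely matrix inversions and multiplications.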

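Employing the bivariate moment-cumulant inversion formulae as
given by Cook (1951), the integral in (3.5) can be written as

(3.8)  $W = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} b_{ij}\, \nu^{-i} \nu_1^{-j}$

where

$b_{00} = 1$

$b_{10} = \frac{1}{4}\left[\kappa_{20} + \kappa_{10}^2 - 2p\, \kappa_{10}\right]$

$b_{01} = \frac{1}{4}\left[\kappa_{02} + \kappa_{01}^2 - 2p\, \kappa_{01}\right]$

$b_{11} = \frac{1}{16}\left[\kappa_{22} + \kappa_{20}\kappa_{02} + 2\kappa_{11}^2 + 4\kappa_{11}\kappa_{01}\kappa_{10} + \kappa_{10}^2\kappa_{01}^2 + 2\kappa_{21}\kappa_{01} + 2\kappa_{12}\kappa_{10} + \kappa_{20}\kappa_{01}^2 + \kappa_{02}\kappa_{10}^2 - 2p\left(\kappa_{12} + \kappa_{21} + \kappa_{02}\kappa_{10} + \kappa_{20}\kappa_{01} + 2\kappa_{11}\kappa_{10} + 2\kappa_{11}\kappa_{01} + \kappa_{10}^2\kappa_{01} + \kappa_{01}^2\kappa_{10}\right) + 4p^2\left(\kappa_{11} + \kappa_{01}\kappa_{10}\right)\right]$

$b_{20} = \frac{1}{96}\left[3\left(\kappa_{40} + 3\kappa_{20}^2 + 4\kappa_{30}\kappa_{10} + 6\kappa_{20}\kappa_{10}^2 + \kappa_{10}^4\right) - 4(3p+4)\left(\kappa_{30} + 3\kappa_{20}\kappa_{10} + \kappa_{10}^3\right) + 12p(p+2)\left(\kappa_{20} + \kappa_{10}^2\right)\right]$

$b_{02} = \frac{1}{96}\left[3\left(\kappa_{04} + 3\kappa_{02}^2 + 4\kappa_{03}\kappa_{01} + 6\kappa_{02}\kappa_{01}^2 + \kappa_{01}^4\right) - 4(3p+4)\left(\kappa_{03} + 3\kappa_{02}\kappa_{01} + \kappa_{01}^3\right) + 12p(p+2)\left(\kappa_{02} + \kappa_{01}^2\right)\right]$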


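Substituting the results in (3.8) into (3.4), we obtain the following
asymptotic expression for the posterior distribution of $\beta$:

(3.9)  $p(\beta \mid y) = \frac{|D|^{1/2}}{(2\pi)^{p/2}} \exp\left\{-\frac{1}{2} Q(\beta, \bar\beta, D)\right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} d_{ij}\, \nu^{-i} \nu_1^{-j}$

where

$d_{00} = 1$

$d_{10} = p_1 - b_{10}$

$d_{01} = q_1 - b_{01}$

$d_{11} = (p_1 - b_{10})(q_1 - b_{01}) + b_{10} b_{01} - b_{11}$

$d_{20} = p_2 - b_{20} - p_1 b_{10} + b_{10}^2$

$d_{02} = q_2 - b_{02} - q_1 b_{01} + b_{01}^2$

Expressions for additional terms $d_{12}$, $d_{21}$, $d_{22}$, etc. can similarly be
found if desired.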
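The coefficients $d_{ij}$ arise from dividing the double series with coefficients $p_i q_j$ by the double series with coefficients $b_{ij}$. The sketch below carries out that division by a simple recursion, in Python, and compares the result with the closed forms for $d_{10}$, $d_{01}$, $d_{11}$, $d_{20}$ and $d_{02}$; the function name `series_quotient`, the dictionary representation and the numerical inputs are hypothetical and chosen only for illustration.

```python
def series_quotient(num, den, order=2):
    """Coefficients of num/den as a bivariate power series up to total order.

    num and den map (i, j) to the coefficient of x^i y^j; den[(0, 0)] must be 1.
    """
    c = {}
    for total in range(order + 1):
        for i in range(total + 1):
            j = total - i
            acc = num.get((i, j), 0.0)
            for (a, b), d_val in den.items():
                if (a, b) != (0, 0) and a <= i and b <= j:
                    acc -= d_val * c[(i - a, j - b)]
            c[(i, j)] = acc
    return c


# Arbitrary numerical values standing in for p_i, q_j and b_ij (assumed).
p1, q1, p2, q2 = 0.7, -0.3, 1.1, 0.4
b = {(0, 0): 1.0, (1, 0): 0.2, (0, 1): -0.5,
     (1, 1): 0.3, (2, 0): 0.15, (0, 2): -0.25}
num = {(0, 0): 1.0, (1, 0): p1, (0, 1): q1,
       (1, 1): p1 * q1, (2, 0): p2, (0, 2): q2}

d = series_quotient(num, b)
d11_closed = (p1 - b[(1, 0)]) * (q1 - b[(0, 1)]) + b[(1, 0)] * b[(0, 1)] - b[(1, 1)]
print(d[(1, 1)], d11_closed)
```

The same recursion, run with a larger `order` and with the corresponding $b_{ij}$ supplied, would produce the additional terms mentioned above.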

The  posterior  distribution  is  thus  expressed  in  the  form  of  a 
multivariate  normal  distribution  multiplied  by  a  power  series  in  v^and  v*1. 
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When  both  v  and  tend  to  infinity,  all  terms  of  the  power  series  except 
the  leading  one  vanish  so  that,  in  the  limit,  the  posterior  distribution 
is  multivariate  normal  with  mean  3  and  covariance  matrix  D  .  For 
finite  values  of  v  and  v^,  the  terms  in  the  power  series  can  be  regarded 
as  "corrections"  in  a  normal  approximation  to  the  multivariate  "double-t" 
distribution.  From  (3.2),  (3.3)  and  (3.7),  we  see  that  numerical 
evaluation  of  the  coefficients  in  the  power  series  involves  merely  matrix 
inversions  and  multiplications,  operations  which  are  easily  performed  on 
an  electronic  computer. 

We  note  that  v;hen  the  posterior  distribution  is  a  univariace 
distribution  as  in  (2.25),  the  results  in  (3.9)  are  in  exact  agreement  with 
those  obtained  by  Fisher  (1961b)  in  a  similar  treatment  of  the  problem 
(see  discussion  in  Section  2.6).  In  Fisher's  derivation,  each  term  of 
the  integral  W  in  (3.5)  was  expressed  in  terms  of  the  moments  of  a  univariate 
normal  distribution.  It  can  therefore  be  evaluated  directly  without  making 
use  of  the  mixed -cumulant  formulae  given  in  (3.7)  which  seem  more  convenient 
for  the  multivariate  case  considered  here. 

For the univariate case, posterior probabilities can be calculated
using the formulae given in Fisher's paper cited above. When $p > 1$,
numerical evaluation of joint probabilities becomes exceedingly cumbersome.
Nevertheless, using the expression (3.9) the density function can be
calculated conveniently. When $p = 2$, the joint distribution contours can
of course be plotted, giving the investigator a complete summary of the
information about $\boldsymbol{\beta}$. This will be illustrated by an example in Section 4.

* It should be obvious that if one of $\nu$ and $\nu_1$ tends to infinity
while the other remains finite, the multivariate "double-t" posterior
distribution tends to the multivariate "normal-t" form. Our above
development can easily be modified to yield an asymptotic expression
for the latter distribution.

3.2 The Marginal Posterior Distribution


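When interest centers on a subset of the elements of $\boldsymbol{\beta}$, say
$\boldsymbol{\beta}_1 = (\beta_1, \ldots, \beta_\ell)'$, an asymptotic expression for the corresponding
marginal posterior distribution can be obtained by integrating out the
remaining elements, $\boldsymbol{\beta}_2 = (\beta_{\ell+1}, \ldots, \beta_p)'$, from the joint distribution in
(3.9). We have that

$$
p(\boldsymbol{\beta}_1 \mid y) = \frac{|D|^{1/2}}{(2\pi)^{p/2}} \int \exp\left\{-\tfrac{1}{2}\, Q(\boldsymbol{\beta}, \bar{\boldsymbol{\beta}}, D)\right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} d_{ij}\, \nu^{-i}\, \nu_1^{-j}\; d\boldsymbol{\beta}_2 .
\tag{3.10}
$$

Denoting $\bar{\boldsymbol{\beta}} = (\bar{\boldsymbol{\beta}}_1' \,\vdots\, \bar{\boldsymbol{\beta}}_2')'$ and partitioning the matrices $D$ and $D^{-1}$ into:

$$
D = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix},
\qquad
D^{-1} = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix},
$$

where the leading diagonal blocks are $\ell \times \ell$ and the trailing diagonal blocks are $(p-\ell) \times (p-\ell)$,
we can write the marginal posterior distribution as:

$$
p(\boldsymbol{\beta}_1 \mid y) = \frac{|V_{11}^{-1}|^{1/2}}{(2\pi)^{\ell/2}} \exp\left\{-\tfrac{1}{2}\, Q(\boldsymbol{\beta}_1, \bar{\boldsymbol{\beta}}_1, V_{11}^{-1})\right\} f(\boldsymbol{\beta}_1 \mid y)
\tag{3.11}
$$

where

$$
f(\boldsymbol{\beta}_1 \mid y) = \frac{|D_{22}|^{1/2}}{(2\pi)^{(p-\ell)/2}} \int \exp\left\{-\tfrac{1}{2}\, Q(\boldsymbol{\beta}_2, a, D_{22})\right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} d_{ij}\, \nu^{-i}\, \nu_1^{-j}\; d\boldsymbol{\beta}_2
\tag{3.12}
$$

with $a = \bar{\boldsymbol{\beta}}_2 - D_{22}^{-1} D_{21} (\boldsymbol{\beta}_1 - \bar{\boldsymbol{\beta}}_1)$.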
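The step from (3.10) to (3.11) and (3.12) rests on the usual factorization of a multivariate normal density into a marginal and a conditional part. The following Python sketch checks that factorization numerically. It is only an illustration: the dimensions, the matrix `D`, the centre `beta_bar` and the evaluation point `beta` are hypothetical random values, and `Q(x, c, W)` is assumed to denote the quadratic form (x - c)'W(x - c).

```python
import numpy as np

rng = np.random.default_rng(1)
p, ell = 4, 2                         # hypothetical dimensions

A = rng.standard_normal((p, p))
D = A @ A.T + p * np.eye(p)           # hypothetical positive definite D
V = np.linalg.inv(D)                  # D^{-1}, partitioned into V_11, V_21, ...
beta_bar = rng.standard_normal(p)     # hypothetical centre of the normal factor
beta = rng.standard_normal(p)         # arbitrary point at which to evaluate

D21, D22 = D[ell:, :ell], D[ell:, ell:]
V11, V21 = V[:ell, :ell], V[ell:, :ell]
b1, b2 = beta[:ell], beta[ell:]
m1, m2 = beta_bar[:ell], beta_bar[ell:]


def Q(x, c, W):
    """Quadratic form (x - c)' W (x - c); assumed meaning of Q(x, c, W)."""
    d = x - c
    return d @ W @ d


# Conditional mean a in the precision form used with (3.12), and the
# equivalent covariance form m2 + V21 V11^{-1} (b1 - m1).
a = m2 - np.linalg.solve(D22, D21 @ (b1 - m1))
a_cov_form = m2 + V21 @ np.linalg.solve(V11, b1 - m1)

# Exponent and determinant factorizations behind (3.11) and (3.12).
q_joint = Q(beta, beta_bar, D)
q_split = Q(b1, m1, np.linalg.inv(V11)) + Q(b2, a, D22)
det_joint = np.linalg.det(D)
det_split = np.linalg.det(D22) / np.linalg.det(V11)

print(np.allclose(a, a_cov_form),
      np.isclose(q_joint, q_split),
      np.isclose(det_joint, det_split))
```

All three comparisons should agree up to rounding, which is why the normalizing constant and the exponent in (3.10) separate cleanly into the two factors.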


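From the expressions for $d_{ij}$ given in (3.9), we see that each term in
the integral $f(\boldsymbol{\beta}_1 \mid y)$ is a bivariate polynomial in the quadratic forms
$Q(\boldsymbol{\beta}, \hat{\boldsymbol{\beta}}, B)$ and $Q(\boldsymbol{\beta}, \tilde{\boldsymbol{\beta}}, M)$ where $\boldsymbol{\beta}_1$ is considered fixed and $\boldsymbol{\beta}_2$ has a
multivariate normal distribution with mean $a$ and covariance matrix $D_{22}^{-1}$.
Adopting the same procedure as that described in the preceding section,
and by setting

$$
B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix},
\qquad
B^{-1} = \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix},
$$

$$
M = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix},
\qquad
M^{-1} = \begin{pmatrix} N_{11} & N_{12} \\ N_{21} & N_{22} \end{pmatrix},
$$

each partitioned conformably with $D$ (leading block $\ell \times \ell$), we obtain, for
the mixed cumulants of $Q(\boldsymbol{\beta}, \hat{\boldsymbol{\beta}}, B)$ and $Q(\boldsymbol{\beta}, \tilde{\boldsymbol{\beta}}, M)$,

$$
\begin{aligned}
\omega_{10} &= \operatorname{tr} D_{22}^{-1} B_{22} + \gamma_1' B_{22} \gamma_1 + Q(\boldsymbol{\beta}_1, \hat{\boldsymbol{\beta}}_1, E_{11}^{-1}) \\
\omega_{01} &= \operatorname{tr} D_{22}^{-1} M_{22} + \gamma_2' M_{22} \gamma_2 + Q(\boldsymbol{\beta}_1, \tilde{\boldsymbol{\beta}}_1, N_{11}^{-1}) \\
\omega_{rs} &= 2^{r+s-1} (r+s-2)! \,\bigl\{ (r+s-1) \operatorname{tr} D_{22}^{-1} H^{rs} + (r\gamma_1 + s\gamma_2)' H^{rs} (r\gamma_1 + s\gamma_2) \\
&\qquad\qquad - r \gamma_1' H^{rs} \gamma_1 - s \gamma_2' H^{rs} \gamma_2 \bigr\}, \qquad r + s \ge 2,
\end{aligned}
\tag{3.13}
$$

where

$$
\begin{aligned}
H^{rs} &= D_{22} (D_{22}^{-1} B_{22})^r (D_{22}^{-1} M_{22})^s \\
\gamma_1 &= a - \hat{\boldsymbol{\beta}}_2 + B_{22}^{-1} B_{21} (\boldsymbol{\beta}_1 - \hat{\boldsymbol{\beta}}_1) \\
\gamma_2 &= a - \tilde{\boldsymbol{\beta}}_2 + M_{22}^{-1} M_{21} (\boldsymbol{\beta}_1 - \tilde{\boldsymbol{\beta}}_1) .
\end{aligned}
$$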
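The first-order cumulant in (3.13) can be checked against the elementary formula for the mean of a quadratic form in normal variables. The sketch below does this for the form in B. All inputs are hypothetical random values (`random_pd`, `beta_hat`, `a` and `b1` are illustrative names, and `a` is simply drawn at random rather than computed from the conditional-mean formula, since the identity holds for any value).

```python
import numpy as np

rng = np.random.default_rng(2)
p, ell = 4, 2                          # hypothetical dimensions


def random_pd(k):
    """Return a hypothetical k x k positive definite matrix."""
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)


B, M = random_pd(p), random_pd(p)
D = B + M
beta_hat = rng.standard_normal(p)      # hypothetical centre of the B form
a = rng.standard_normal(p - ell)       # stands in for the conditional mean a
b1 = rng.standard_normal(ell)          # fixed value of the subset beta_1

B21, B22 = B[ell:, :ell], B[ell:, ell:]
D22 = D[ell:, ell:]
E11 = np.linalg.inv(B)[:ell, :ell]     # leading block of B^{-1}

d1 = b1 - beta_hat[:ell]
gamma1 = a - beta_hat[ell:] + np.linalg.solve(B22, B21 @ d1)

# omega_10 as written in (3.13).
omega10 = (np.trace(np.linalg.solve(D22, B22))
           + gamma1 @ B22 @ gamma1
           + d1 @ np.linalg.inv(E11) @ d1)

# Direct evaluation of E[Q] with beta_1 fixed and beta_2 ~ N(a, D22^{-1}):
# the quadratic form at the mean plus trace(B22 times the covariance).
mu = np.concatenate([b1, a]) - beta_hat
direct = mu @ B @ mu + np.trace(B22 @ np.linalg.inv(D22))

print(omega10, direct)
```

The two numbers should coincide up to rounding; the term in $E_{11}^{-1}$ is what remains of the quadratic form once the part involving the integrated-out elements has been completed into a square.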


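Using the results in (3.13), we can express the marginal posterior
distribution of $\boldsymbol{\beta}_1$ as:

$$
p(\boldsymbol{\beta}_1 \mid y) = \frac{|V_{11}^{-1}|^{1/2}}{(2\pi)^{\ell/2}} \exp\left\{-\tfrac{1}{2}\, Q(\boldsymbol{\beta}_1, \bar{\boldsymbol{\beta}}_1, V_{11}^{-1})\right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \delta_{ij}\, \nu^{-i}\, \nu_1^{-j}
\tag{3.14}
$$

where

$$
\begin{aligned}
\delta_{00} &= 1 \\
\delta_{10} &= g_{10} - b_{10} \\
\delta_{01} &= g_{01} - b_{01} \\
\delta_{11} &= g_{11} - b_{11} - g_{10} b_{01} - g_{01} b_{10} + 2\, b_{01} b_{10} \\
\delta_{20} &= g_{20} - b_{20} - g_{10} b_{10} + b_{10}^2 \\
\delta_{02} &= g_{02} - b_{02} - g_{01} b_{01} + b_{01}^2
\end{aligned}
$$

and the quantities $g_{ij}$ are functions of the mixed cumulants $\omega_{rs}$ with the
functional relationships exactly the same as those between $b_{ij}$ and $\kappa_{rs}$
shown in (3.8).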
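The expressions listed under (3.14) have the same algebraic form as the coefficients, up to second order, of the quotient of a power series with coefficients $g_{ij}$ by one with coefficients $b_{ij}$, both with leading term 1. That reading is our interpretation and is not stated in the text. The sketch below checks it numerically with hypothetical coefficient values; `x` and `y` stand in for the two reciprocal degrees of freedom.

```python
# Hypothetical coefficients, chosen only for illustration (g_00 = b_00 = 1).
g = {(0, 0): 1.0, (1, 0): 0.7, (0, 1): -0.4,
     (2, 0): 1.3, (1, 1): 0.5, (0, 2): -0.9}
b = {(0, 0): 1.0, (1, 0): 0.2, (0, 1): 0.6,
     (2, 0): -0.8, (1, 1): 1.1, (0, 2): 0.3}

# The coefficients as listed under (3.14).
delta = {
    (0, 0): 1.0,
    (1, 0): g[1, 0] - b[1, 0],
    (0, 1): g[0, 1] - b[0, 1],
    (1, 1): (g[1, 1] - b[1, 1] - g[1, 0] * b[0, 1] - g[0, 1] * b[1, 0]
             + 2 * b[0, 1] * b[1, 0]),
    (2, 0): g[2, 0] - b[2, 0] - g[1, 0] * b[1, 0] + b[1, 0] ** 2,
    (0, 2): g[0, 2] - b[0, 2] - g[0, 1] * b[0, 1] + b[0, 1] ** 2,
}


def series(coef, x, y, max_order=2):
    """Sum coef[i, j] * x**i * y**j over the terms with i + j <= max_order."""
    return sum(c * x**i * y**j
               for (i, j), c in coef.items() if i + j <= max_order)


x, y = 1e-4, 2e-4                      # small stand-ins for the reciprocals
exact = series(g, x, y) / series(b, x, y)
err_first = abs(exact - series(delta, x, y, max_order=1))
err_second = abs(exact - series(delta, x, y, max_order=2))

print(err_first, err_second)
```

Including the second-order terms should reduce the error from second order to third order in the small quantities, which is what one expects if the listed expressions are indeed quotient coefficients.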


It will be noted that when $\boldsymbol{\beta}_1$ consists of only one variable
($\ell = 1$), the quantities $\delta_{ij}$ in (3.14) are simply polynomials in that
variable. Employing the well known expression for the moments of a normal
variable, one can easily derive an asymptotic expression for the moments
of $\beta_1$. In addition, probability integrals can also be approximated using
methods given in Fisher's previously cited paper (1961b).

IV. AN ILLUSTRATIVE EXAMPLE

To illustrate application of the techniques developed in Sections
2 and 3, we analyze a very simple econometric investment model with annual
time series data, 1935-1954, relating to two large corporations, General
Electric and Westinghouse. In this model, price deflated gross investment
is assumed to be a linear function of expected profitability and beginning
of year real capital stock. Following Grunfeld (1958), the value of
outstanding shares at the beginning of the year is taken as a measure of
a firm's expected profitability. The two investment relations are:

$$
\begin{aligned}
y_1(t) &= \alpha_1 + \beta_1 x_{11}(t) + \beta_2 x_{12}(t) + \epsilon_1(t) \\
y_2(t) &= \alpha_2 + \beta_1 x_{21}(t) + \beta_2 x_{22}(t) + \epsilon_2(t)
\end{aligned}
\tag{4.1}
$$

where $t$ in parentheses denotes the value of a variable in year $t$,
$t = 1, 2, \ldots, 20$, and


Variable                                   General Electric    Westinghouse

Annual real gross investment               $y_1(t)$            $y_2(t)$
Value of shares at beginning of year       $x_{11}(t)$         $x_{21}(t)$
Real capital stock at beginning of year    $x_{12}(t)$         $x_{22}(t)$
Error term                                 $\epsilon_1(t)$     $\epsilon_2(t)$

The parameters $\beta_1$ and $\beta_2$ in (4.1) are taken to be the same for the two
firms; however, $\alpha_1$ and $\alpha_2$ are assumed to be different to allow for certain
possible differences in the investment behavior of the two firms. Further,
$\epsilon_1(t)$ and $\epsilon_2(t)$ are assumed to be independently and normally distributed
for all $t$ with zero means and variances $\sigma_1^2$ and $\sigma^2$, respectively. Since
we have no information from which to posit a relationship connecting
$\sigma_1^2$ and $\sigma^2$, we take them to be independent parameters and pursue the
development described in Section 2.6.

* The data are taken from Boot and deWitt (1960).

In the present instance we can regard either General Electric's or
Westinghouse's data as being generated "first" and derive a joint posterior
distribution of the relevant parameters. This can then serve to represent
prior information in the analysis of the second set of data. Or, with
locally uniform prior distributions for the parameters in both equations,
one can analyze both sets of data at the same time. In both cases the
final result is the same: a joint posterior distribution for $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$
which is in the form of the product of two multivariate t distributions.*
On integrating out $\alpha_1$ and $\alpha_2$, the coefficients $\beta_1$ and $\beta_2$ will be jointly
distributed in a bivariate "double t" form. [See equation (2.24).]
Numerical values for quantities appearing in (2.24) and (3.9) are shown
below:


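General Electric                          Westinghouse

$\tilde{\beta}_1 = 0.02655$               $\hat{\beta}_1 = 0.05289$
$\tilde{\beta}_2 = 0.1517$                $\hat{\beta}_2 = 0.09241$
$s_1^2 = 777.4463$                        $s^2 = 104.3079$
$\nu_1 = 17$                              $\nu = 17$

$$
M = \begin{pmatrix} 4185.1054 & 299.6748 \\ 299.6748 & 1535.0640 \end{pmatrix},
\qquad
B = \begin{pmatrix} 9010.5868 & 1871.1079 \\ 1871.1079 & 706.3320 \end{pmatrix}
$$

$\bar{\boldsymbol{\beta}}' = (.0373, .1446)$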
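The tabulated centre (.0373, .1446) and the limiting variance quoted later for the first coefficient can be reproduced from the other entries of the table. The sketch below is an illustration under a stated assumption: the centre of the limiting normal is taken to be the matrix-weighted average of the two sets of estimates, a formula that is not quoted in this section. The Python names (`beta_tilde`, `beta_hat`, `beta_bar`) are ours.

```python
import numpy as np

# Values transcribed from the table above.
beta_tilde = np.array([0.02655, 0.1517])     # General Electric estimates
beta_hat = np.array([0.05289, 0.09241])      # Westinghouse estimates
M = np.array([[4185.1054, 299.6748],
              [299.6748, 1535.0640]])        # General Electric matrix
B = np.array([[9010.5868, 1871.1079],
              [1871.1079, 706.3320]])        # Westinghouse matrix

D = B + M                                    # D = B + M, as in the Appendix
V = np.linalg.inv(D)                         # covariance of the limiting normal

# ASSUMPTION: the limiting normal is centred at the matrix-weighted average
# of the two sets of estimates.
beta_bar = V @ (B @ beta_hat + M @ beta_tilde)

# Correlation implied by the limiting covariance matrix.
corr = V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])

print(beta_bar, V[0, 0], corr)
```

Under that assumption the computed centre agrees with the tabulated value to the four decimals shown, and the leading element of the inverse matches the limiting variance used in the marginal analysis below.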


* If one is interested in the parameters $\alpha_1$ and $\alpha_2$, it should be
obvious that, a posteriori, they are distributed in the form of two
independent t variables. In particular, the difference, $\alpha_1 - \alpha_2$, has the
Behrens-Fisher distribution.

A plot of the contours of the joint density surface is shown in
Figure 1 along with lines showing the loci of conditional modes. These
contours summarize all the relevant information about the coefficients
$\beta_1$ and $\beta_2$. We see that the posterior distribution is concentrated rather
sharply in the region $.0278 < \beta_1 < .0468$ and $.1216 < \beta_2 < .1676$, with mode
at about (.0373, .1446). Further, $\beta_1$ and $\beta_2$ are seen to be negatively
correlated and the contours are approximately elliptical. The latter is
the case because the joint density function is close to its limiting
bivariate normal distribution. This arises from the fact that in this
example both $\nu_1$ and $\nu$ are rather large.
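A rough way to see where the joint density peaks is to evaluate it on a grid. The sketch below does so under an assumption: equation (2.24) is not reproduced in this section, so the density is taken to be the product of two bivariate t kernels of the standard form $[1 + Q/\nu]^{-(\nu + p)/2}$ with $p = 2$. The numerical inputs are those of the table of numerical values; the grid limits and the Python names are ours.

```python
import numpy as np

# Values transcribed from the table of numerical values.
beta_tilde = np.array([0.02655, 0.1517])     # General Electric estimates
beta_hat = np.array([0.05289, 0.09241])      # Westinghouse estimates
M = np.array([[4185.1054, 299.6748],
              [299.6748, 1535.0640]])
B = np.array([[9010.5868, 1871.1079],
              [1871.1079, 706.3320]])
nu1, nu, p = 17, 17, 2


def quad(b1, b2, centre, W):
    """Quadratic form (b - centre)' W (b - centre), evaluated elementwise."""
    d1, d2 = b1 - centre[0], b2 - centre[1]
    return W[0, 0] * d1**2 + 2 * W[0, 1] * d1 * d2 + W[1, 1] * d2**2


def log_double_t(b1, b2):
    """Log of the ASSUMED double-t kernel, up to an additive constant."""
    return (-0.5 * (nu1 + p) * np.log1p(quad(b1, b2, beta_tilde, M) / nu1)
            - 0.5 * (nu + p) * np.log1p(quad(b1, b2, beta_hat, B) / nu))


# Hypothetical grid covering the region discussed in the text.
g1 = np.linspace(0.02, 0.06, 401)
g2 = np.linspace(0.08, 0.20, 401)
G1, G2 = np.meshgrid(g1, g2, indexing="ij")
logk = log_double_t(G1, G2)

i, j = np.unravel_index(np.argmax(logk), logk.shape)
mode = (g1[i], g2[j])
print(mode)
```

The array `logk` can be passed to any contouring routine to produce a picture comparable to Figure 1; the grid maximum gives a quick check on the location of the mode under the assumed kernel.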

When interest centers on only one of the parameters, say $\beta_1$,
the expression in (3.14) can be employed to calculate the corresponding
marginal distribution. For this example we evaluated (3.14) disregarding
terms for which $i + j > 2$. The results are shown by the solid curve in
Figure 2. The broken curve in the same figure represents the limiting
normal density function with mean $\bar{\beta}_1 = .0373$ and variance $V_{11} = 9.01445 \times 10^{-5}$.
It will be noted that the posterior distribution of $\beta_1$ is somewhat flatter
at the center and fatter in the tails than its limiting distribution.
Also, it is slightly skewed. The mean and variance of the distribution
of $\beta_1$ were computed from (3.14) neglecting terms for which $i + j > 1$.
The calculation yielded the following results:


            Limiting Normal             Finite Sample Corrections                    Finite Sample Mean
            Distribution                $\delta_{10}$           $\delta_{01}$                and Variance of $\beta_1$
            (1)                         (2)                     (3)                          (1) + (2) + (3)

Mean        .0373                       -.000229                .000191                      .03726
Variance    $9.01445 \times 10^{-5}$    $.5985 \times 10^{-5}$  $.0028 \times 10^{-5}$       $9.6158 \times 10^{-5}$

The mean of $\beta_1$ is extremely close to its asymptotic value. On the other hand,
the variance of $\beta_1$ is about 6 percent larger than that of the limiting
distribution.
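The arithmetic behind the last column of the table is simply column (1) plus the two correction columns. The short sketch below reproduces it from the tabulated entries and also prints the implied relative increase in variance; the variable names are ours.

```python
# Entries transcribed from the table above.
normal_mean = 0.0373
mean_corrections = (-0.000229, 0.000191)      # columns (2) and (3), mean row
normal_var = 9.01445e-5
var_corrections = (0.5985e-5, 0.0028e-5)      # columns (2) and (3), variance row

# Finite sample values are column (1) plus the two corrections.
finite_mean = normal_mean + sum(mean_corrections)
finite_var = normal_var + sum(var_corrections)

# Relative increase of the finite sample variance over the limiting variance.
rel_increase = finite_var / normal_var - 1.0

print(f"{finite_mean:.5f}  {finite_var:.4e}  {rel_increase:.1%}")
```

The sums agree with the tabulated finite sample mean and variance to the precision shown, and the relative increase falls between 6 and 7 percent.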
V. SUMMARY

In this paper, we have adopted a Bayesian approach to the problem
of integrating prior information into the analysis of the normal regression
model. Initially, we reviewed Jeffreys' and Savage's analysis wherein
prior knowledge (or lack of substantial prior knowledge) about the
regression coefficient $\boldsymbol{\beta}$ and the logarithm of the scale parameter $\sigma$ is
represented by locally uniform distributions. We then turned to consider
a normal-gamma representation of prior information about $\boldsymbol{\beta}$ and an additional
scale parameter $\sigma_1$. Here we discussed three possible assumptions about
the two scale parameters, namely, (i) $\sigma_1 = k\sigma$ with known value of $k$,
the Raiffa and Schlaifer case; (ii) $\sigma_1$ fixed and functionally independent
of $\sigma$; and (iii) both $\sigma_1$ and $\sigma$ unknown and assumed independent a priori.

With assumption (ii), we were able to provide a reinterpretation
of the "mixed" estimation procedure of Theil and Goldberger. It was
shown that the posterior distribution of $\boldsymbol{\beta}$ takes the form of a product of
multivariate normal and multivariate t distributions.

Under the third assumption, we obtained what may be regarded as
a generalization of Fisher's work on the problem of making inferences when
samples are drawn from two normal populations with common mean and unequal
variances. In this case, it was shown that the posterior distribution of
$\boldsymbol{\beta}$ is in the form of the product of two multivariate t distributions. For
computational purposes, the distribution was expanded in an asymptotic
series which involved finding the mixed cumulants of pairs of quadratic
forms in normal variables. A bivariate example was analyzed in detail.

Appendix

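In Section 3.1, we have stated that the joint cumulant generating
function of the quadratic forms $Q(\boldsymbol{\beta}, \tilde{\boldsymbol{\beta}}, M)$ and $Q(\boldsymbol{\beta}, \hat{\boldsymbol{\beta}}, B)$ is given by

$$
\begin{aligned}
K(t_1, t_2) = {} & -\tfrac{1}{2} \log \left| I - 2 D^{-1} (t_1 B + t_2 M) \right| + t_1 \eta_1' B \eta_1 + t_2 \eta_2' M \eta_2 \\
& + 2\, (t_1 B \eta_1 + t_2 M \eta_2)' (D - 2 t_1 B - 2 t_2 M)^{-1} (t_1 B \eta_1 + t_2 M \eta_2) .
\end{aligned}
\tag{A.1}
$$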

We now derive the expressions for the mixed cumulants shown in (3.7). In
our development, we shall make use of the following lemma, the proof of
which can be found, for example, in Box (1954).

Lemma: Let $P$ be an $n \times n$ positive definite symmetric matrix and $Q$ be an $n \times n$
nonnegative definite symmetric matrix. Then, for sufficiently small $t$,
we have

$$
\log \left| I - tPQ \right| = - \sum_{r=1}^{\infty} \frac{t^r}{r} \operatorname{tr} (PQ)^r .
$$
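The lemma is the matrix form of the logarithmic series, and it can be checked numerically. The sketch below uses hypothetical random matrices `P` and `Q` and writes `t` for the small scalar in the lemma; `t` is scaled so that the spectral radius of `t * P @ Q` equals 0.5, which is one concrete way of making it "sufficiently small".

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)        # hypothetical positive definite symmetric P
Q = C @ C.T                        # hypothetical nonnegative definite symmetric Q
PQ = P @ Q

# Scale t so that the spectral radius of t * PQ is 0.5, well inside the
# region where the series converges.
rho = max(abs(np.linalg.eigvals(PQ)))
t = 0.5 / rho

# Left-hand side: log-determinant of I - t PQ.
sign, lhs = np.linalg.slogdet(np.eye(n) - t * PQ)

# Right-hand side: minus the sum over r of t^r tr[(PQ)^r] / r, truncated.
series = 0.0
power = np.eye(n)
for r in range(1, 200):
    power = power @ (t * PQ)       # (t PQ)^r
    series -= np.trace(power) / r

print(sign, lhs, series)
```

With the spectral radius fixed at 0.5 the truncated series agrees with the log-determinant to within rounding; choosing `t` closer to the convergence boundary would require more terms.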

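Employing the above lemma and for sufficiently small values of $t_1$ and $t_2$, we
can expand the first term on the right of (A.1) into:

$$
-\tfrac{1}{2} \log \left| I - 2 D^{-1} (t_1 B + t_2 M) \right| = \sum_{r=1}^{\infty} \frac{2^{r-1}}{r} \operatorname{tr} (t_1 D^{-1} B + t_2 D^{-1} M)^r .
\tag{A.2}
$$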

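The quadratic form $t_1 \eta_1' B \eta_1$ can be written:

$$
\begin{aligned}
t_1 \eta_1' B \eta_1 &= t_1 \eta_1' B (D - 2 t_1 B - 2 t_2 M)^{-1} (D - 2 t_1 B - 2 t_2 M)\, \eta_1 \\
&= t_1 \eta_1' B (D - 2 t_1 B - 2 t_2 M)^{-1} D \eta_1 - 2 t_1^2\, \eta_1' B (D - 2 t_1 B - 2 t_2 M)^{-1} B \eta_1 \\
&\quad - 2 t_1 t_2\, \eta_1' B (D - 2 t_1 B - 2 t_2 M)^{-1} M \eta_1 .
\end{aligned}
\tag{A.3}
$$

Similarly,

$$
\begin{aligned}
t_2 \eta_2' M \eta_2 &= t_2 \eta_2' M (D - 2 t_1 B - 2 t_2 M)^{-1} D \eta_2 - 2 t_2^2\, \eta_2' M (D - 2 t_1 B - 2 t_2 M)^{-1} M \eta_2 \\
&\quad - 2 t_1 t_2\, \eta_2' B (D - 2 t_1 B - 2 t_2 M)^{-1} M \eta_2 .
\end{aligned}
\tag{A.4}
$$

Thus, the expression in (A.1) becomes

$$
\begin{aligned}
K(t_1, t_2) = {} & \sum_{r=1}^{\infty} \frac{2^{r-1}}{r} \operatorname{tr} (t_1 D^{-1} B + t_2 D^{-1} M)^r + t_1 \eta_1' B (I - 2 t_1 D^{-1} B - 2 t_2 D^{-1} M)^{-1} \eta_1 \\
& + t_2 \eta_2' M (I - 2 t_1 D^{-1} B - 2 t_2 D^{-1} M)^{-1} \eta_2 \\
& - 2 t_1 t_2\, (\eta_1 - \eta_2)' B (I - 2 t_1 D^{-1} B - 2 t_2 D^{-1} M)^{-1} D^{-1} M (\eta_1 - \eta_2) .
\end{aligned}
\tag{A.5}
$$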

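Since $D = B + M$, it is easy to see that the matrix $B D^{-1} M$ is symmetric.
In virtue of this property, we have

$$
(t_1 D^{-1} B + t_2 D^{-1} M)^r = \sum_{i=0}^{r} \binom{r}{i}\, t_1^i\, t_2^{r-i}\, (D^{-1} B)^i (D^{-1} M)^{r-i}
\tag{A.6}
$$

and, for sufficiently small values of $t_1$ and $t_2$,

$$
(I - 2 t_1 D^{-1} B - 2 t_2 D^{-1} M)^{-1} = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} 2^{i+j}\, \frac{(i+j)!}{i!\, j!}\, t_1^i\, t_2^j\, (D^{-1} B)^i (D^{-1} M)^j .
\tag{A.7}
$$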

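Substituting (A.6) and (A.7) into (A.5) and after a little rearrangement,
we find,

$$
\begin{aligned}
K(t_1, t_2) = {} & \sum_{r=1}^{\infty} 2^{r-1}\, \frac{t_1^r}{r} \left\{ \operatorname{tr} (D^{-1} B)^r + r\, \eta_1' D (D^{-1} B)^r \eta_1 \right\} \\
& + \sum_{s=1}^{\infty} 2^{s-1}\, \frac{t_2^s}{s} \left\{ \operatorname{tr} (D^{-1} M)^s + s\, \eta_2' D (D^{-1} M)^s \eta_2 \right\} \\
& + \sum_{r=1}^{\infty} \sum_{s=1}^{\infty} 2^{r+s-1}\, \frac{t_1^r t_2^s}{r!\, s!}\, (r+s-2)!\, \bigl\{ (r+s-1) \operatorname{tr} D^{-1} G^{rs} \\
& \qquad + (r \eta_1 + s \eta_2)' G^{rs} (r \eta_1 + s \eta_2) - r\, \eta_1' G^{rs} \eta_1 - s\, \eta_2' G^{rs} \eta_2 \bigr\}
\end{aligned}
\tag{A.8}
$$

where

$$
G^{rs} = D (D^{-1} B)^r (D^{-1} M)^s .
$$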


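Upon differentiating (A.8), we obtain

$$
\kappa_{r0} = 2^{r-1} (r-1)! \left\{ \operatorname{tr} (D^{-1} B)^r + r\, \eta_1' D (D^{-1} B)^r \eta_1 \right\}
\tag{A.9}
$$

$$
\kappa_{0s} = 2^{s-1} (s-1)! \left\{ \operatorname{tr} (D^{-1} M)^s + s\, \eta_2' D (D^{-1} M)^s \eta_2 \right\}
\tag{A.10}
$$

$$
\begin{aligned}
\kappa_{rs} = 2^{r+s-1} (r+s-2)!\, \bigl\{ & (r+s-1) \operatorname{tr} D^{-1} G^{rs} + (r \eta_1 + s \eta_2)' G^{rs} (r \eta_1 + s \eta_2) \\
& - r\, \eta_1' G^{rs} \eta_1 - s\, \eta_2' G^{rs} \eta_2 \bigr\}, \qquad r, s \ge 1,
\end{aligned}
\tag{A.11}
$$

which can then be combined into the expressions given in (3.7). Note
that Box (1960) has derived expressions (A.9) and (A.10) directly from
the individual cumulant generating function of $Q(\boldsymbol{\beta}, \hat{\boldsymbol{\beta}}, B)$ and that of
$Q(\boldsymbol{\beta}, \tilde{\boldsymbol{\beta}}, M)$, respectively.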
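The low-order cases of (A.9) to (A.11) are the mean, variance and covariance of the two quadratic forms, so they can be compared with simulation. The sketch below is an illustration only. It uses the matrices of Section IV and makes two assumptions that are not spelled out in this Appendix: the normal distribution is centred at the matrix-weighted average of the two sets of estimates, and $\eta_1$, $\eta_2$ are the offsets of that centre from the centres of the B and M forms respectively. The sample size and seed are arbitrary.

```python
import numpy as np

# Values transcribed from Section IV.
beta_tilde = np.array([0.02655, 0.1517])
beta_hat = np.array([0.05289, 0.09241])
M = np.array([[4185.1054, 299.6748], [299.6748, 1535.0640]])
B = np.array([[9010.5868, 1871.1079], [1871.1079, 706.3320]])

D = B + M
Dinv = np.linalg.inv(D)
beta_bar = Dinv @ (B @ beta_hat + M @ beta_tilde)   # ASSUMED centre
eta1 = beta_bar - beta_hat                          # ASSUMED offset, B form
eta2 = beta_bar - beta_tilde                        # ASSUMED offset, M form


def G(r, s):
    """G^{rs} = D (D^{-1} B)^r (D^{-1} M)^s."""
    return (D @ np.linalg.matrix_power(Dinv @ B, r)
            @ np.linalg.matrix_power(Dinv @ M, s))


# (A.9) with r = 1, 2; (A.10) with s = 2; (A.11) with r = s = 1.
k10 = np.trace(Dinv @ G(1, 0)) + eta1 @ G(1, 0) @ eta1
k20 = 2 * (np.trace(Dinv @ G(2, 0)) + 2 * eta1 @ G(2, 0) @ eta1)
k02 = 2 * (np.trace(Dinv @ G(0, 2)) + 2 * eta2 @ G(0, 2) @ eta2)
e12 = eta1 + eta2
k11 = 2 * (np.trace(Dinv @ G(1, 1)) + e12 @ G(1, 1) @ e12
           - eta1 @ G(1, 1) @ eta1 - eta2 @ G(1, 1) @ eta2)

# Monte Carlo: draw beta ~ N(beta_bar, D^{-1}) through a Cholesky factor.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(Dinv)
draws = beta_bar + rng.standard_normal((200_000, 2)) @ L.T
d1, d2 = draws - beta_hat, draws - beta_tilde
Q1 = np.einsum("ni,ij,nj->n", d1, B, d1)
Q2 = np.einsum("ni,ij,nj->n", d2, M, d2)

print(k10, Q1.mean())
print(k20, Q1.var())
print(k11, np.cov(Q1, Q2)[0, 1])
```

Each printed pair should agree up to simulation error, which gives some reassurance that the reconstructed formulae have the right signs and numerical factors.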